NSF PAR Search | NSF Public Access Repository

Matching the Statistical Query Lower Bound for k-Sparse Parity Problems with Sign Stochastic Gradient Descent

Kou, Yiwen; Chen, Zixiang; Gu, Quanquan; Kakade, Sham M (December 2024, Advances in Neural Information Processing Systems (NeurIPS))

Full Text Available
SOAP: Improving and Stabilizing Shampoo using Adam

Vyas, Nikhil; Morwani, Depen; Zhao, Rosie; Shapira, Itai; Brandfonbrener, David; Janson, Lucas; Kakade, Sham (December 2024, Neural Information Processing Systems Workshop: Optimization for Machine Learning)

Full Text Available
Scaling Laws in Linear Regression: Compute, Parameters, and Data

Lin, Licong; Wu, Jingfeng; Kakade, Sham M; Bartlett, Peter L; Lee, Jason D (December 2024, Advances in Neural Information Processing Systems)

Full Text Available
A Study on the Calibration of In-context Learning

Zhang, Hanlin; Zhang, YiFan; Yu, Yaodong; Madeka, Dhruv; Foster, Dean; Xing, Eric; Lakkaraju, Himabindu; Kakade, Sham (June 2024, Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers))
Duh, Kevin; Gomez, Helena; Bethard, Steven (Ed.)
Full Text Available
A Sharp Characterization of Linear Estimators for Offline Policy Evaluation

Perdomo, Juan; Krishnamurthy, Akshay; Bartlett, Peter L.; Kakade, Sham (October 2023, Journal of Machine Learning Research)

Full Text Available
Benign overfitting of constant-stepsize SGD for linear regression

Zou, Difan; Wu, Jingfeng; Braverman, Vladimir; Gu, Quanquan; Kakade, Sham (October 2023, Journal of Machine Learning Research)

Full Text Available
Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron

Wu, Jingfeng; Zou, Difan; Chen, Zixiang; Braverman, Vladimir; Gu, Quanquan; Kakade, Sham (October 2023, International Conference on Machine Learning)

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:37919-37951, 2023.
Full Text Available
DataComp-LM: In search of the next generation of training sets for language models

Li, Jeffrey; Fang, Alex; Smyrnis, Georgios; Ivgi, Maor; Jordan, Matt; Gadre, Samir; Bansal, Hritik; Guha, Etash; Keh, Sedrick; Arora, Kushal; et al (April 2025, https://doi.org/10.48550/arXiv.2406.11794)

The authors introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments aimed at improving language models. DCLM provides a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations. Participants can experiment with dataset curation strategies such as deduplication, filtering, and data mixing at model scales ranging from 412M to 7B parameters. As a baseline, the authors find that model-based filtering is critical for assembling a high-quality training set. Their resulting dataset, DCLM-Baseline, enables training a 7B parameter model from scratch to achieve 64% 5-shot accuracy on MMLU with 2.6T training tokens. This represents a 6.6 percentage point improvement over MAP-Neo (the previous state-of-the-art in open-data LMs), while using 40% less compute. The baseline model is also comparable to Mistral-7B-v0.3 and Llama 3 8B on MMLU (63% and 66%), and performs similarly on an average of 53 NLU tasks, while using 6.6x less compute than Llama 3 8B. These findings emphasize the importance of dataset design for training LMs and establish a foundation for further research on data curation.
Free, publicly-accessible full text available April 21, 2026
Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression

Wu, Jingfeng; Zou, Difan; Braverman, Vladimir; Gu, Quanquan; Kakade, Sham (January 2022, Proceedings of Machine Learning Research)

Full Text Available
The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift

Wu, Jingfeng; Zou, Difan; Braverman, Vladimir; Gu, Quanquan; Kakade, Sham M. (January 2022, Advances in Neural Information Processing Systems)

Full Text Available
